Saket Choudhary skchoudh@usc.edu
In [15]:
using Distributions;
using Gadfly;
srand(1000);
In [2]:
## Write a routine to simulate a fair
## Bernoulli trial in your language of choice.
function simulate_bernoulli(p, N_trials)
rand(Bernoulli(p), N_trials)
end;
## Write a routine to count the number
## of successes in 5 fair Bernoulli trials.
function count_successes_5_bern()
sum(rand(Bernoulli(0.5), 5))
end;
## Write a routine to count the number of
## trials before the first successful Bernoulli trial.
function count_before_success()
trials = 0
while rand(Bernoulli()) < 1
trials+=1
end
return trials
end;
In [3]:
## Generate a histogram for 100
## simulated Bernoulli trials
trials = simulate_bernoulli(0.5, 100);
plot(x=trials, Geom.histogram,
Guide.xlabel("Number of successes"),
Guide.ylabel("Frequency"),
Guide.title("Distribution of number of successes in one bernoulli trial"))
Out[3]:
Number of successes in 5 bernoulli trials is distributed as Binomial(5,0.5) and with mean = $np = 2.5$
In [4]:
## Generate a histogram for 100
## samples of this counting random.
trials = [count_successes_5_bern() for i=1:100];
plot(x=trials, Geom.histogram,
Guide.xlabel("Number of successes"),
Guide.ylabel("Frequency"),
Guide.title("Distribution of number of successes in five bernoulli trials"))
Out[4]:
Let $X$ represent the associated random variable. If the number of trials before first success be $k$, so for $k-1$ trials the bernoulli trial is a failure.
Then, $P(X=k)=(1-p)^{k-1}p$ $k \in \{1,2,\dots,\}$ i.e. $X$ is a geometric random variable with mean $\frac{1}{p}=2$
In [5]:
## Generate a histogram for 100
## samples of this counting random variable.
trials = [count_before_success() for i=1:100];
plot(x=trials, Geom.histogram,
Guide.xlabel("Number of trials"),
Guide.ylabel("Frequency"),
Guide.title("Distribution of number of trials before first success"))
Out[5]:
In [6]:
function count_successes_5_bern_sumk(k)
sums = 0
for i=1:k
sums+= count_successes_5_bern()
end
return sums
end;
In [7]:
k = 2
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
Guide.xlabel("Mean"),
Guide.ylabel("Frequency"),
Guide.title("Distribution of mean of k=2 trials before first success"))
Out[7]:
In [8]:
k = 5
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
Guide.xlabel("Mean"),
Guide.ylabel("Frequency"),
Guide.title("Distribution of mean of k=5 trials before first success"))
Out[8]:
In [9]:
k = 10
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
Guide.xlabel("Mean"),
Guide.ylabel("Frequency"),
Guide.title("Distribution of mean of k=10 trials before first success"))
Out[9]:
In [10]:
k = 30
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
Guide.xlabel("Mean"),
Guide.ylabel("Frequency"),
Guide.title("Distribution of mean of k=30 trials before first success"))
Out[10]:
In [11]:
k = 50
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
Guide.xlabel("Mean"),
Guide.ylabel("Frequency"),
Guide.title("Distribution of mean of k=30 trials before first success"))
Out[11]:
In [12]:
data = readcsv("data/NJGAS.dat");
μ = mean(data);
In [13]:
N_bootstraps = 1000;
n_samples_to_draw = 20;
bootstrap_mean = sort([mean(sample(data, n_samples_to_draw, replace=true)) for i=1:N_bootstraps]);
bootstrap_quantiles = quantile(bootstrap_mean,[0.25, 0.75]);
In [14]:
print(bootstrap_quantiles)